41 research outputs found

    Image-based Automated Chemical Database Annotation with Ensemble of Machine-Vision Classifiers

    This paper presents an image-based strategy for automated annotation of chemical databases. The proposed strategy is based on a machine vision-based classifier for extracting 2D chemical structure diagrams from research articles and converting them into standard chemical file formats, a virtual "Chemical Expert" system for screening the converted structures based on the level of estimated conversion accuracy, and a fragment-based measure for calculating intermolecular similarity. In particular, to overcome the limited accuracy of an individual machine-vision classifier, an ensemble of machine-vision classifiers is used, inspired by ensemble methods in machine learning. For annotation, the calculated chemical similarity between the converted structures and entries in a virtual small-molecule database is used to establish the links. An annotation test linking 121 journal articles to entries in the PubChem database demonstrates that the ensemble approach increases the coverage of annotation while keeping the annotation quality (e.g., recall and precision rates) comparable to that of a single machine-vision classifier.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/87266/4/Saitou55.pd

    Automated extraction of chemical structure information from digital raster images

    Background: To search for chemical structures in research articles, diagrams or text representing molecules need to be translated into a standard chemical file format compatible with cheminformatic search engines. However, chemical information in research articles is often presented as analog diagrams of chemical structures embedded in digital raster images. Several software systems have been developed to automate the analog-to-digital conversion of chemical structure diagrams in scientific research articles, but their algorithmic performance and utility in cheminformatic research have not been investigated. Results: This paper provides a critical review of these systems and reports our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams from research articles and converting them into standard, searchable chemical file formats. The basic algorithms for recognizing the lines and letters that represent bonds and atoms in chemical structure diagrams can be run independently in sequence from a graphical user interface, and the algorithm parameters can be readily changed, to facilitate additional development specifically tailored to a chemical database annotation scheme. Compared with existing software programs such as OSRA, Kekule, and CLiDE, our results indicate that ChemReader outperforms other software systems on several sets of sample images from diverse sources in terms of the rate of correct outputs and the accuracy in extracting molecular substructure patterns. Conclusion: The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating the entries with published research articles. Based on its stable performance and high accuracy, ChemReader may be sufficiently accurate for annotating the chemical database with links to scientific research articles.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/90875/1/Saitou8.pd

    A Rational Approach to Personalized Anticancer Therapy: Chemoinformatic Analysis Reveals Mechanistic Gene-Drug Associations

    Purpose. To predict the response of cells to chemotherapeutic agents based on gene expression profiles, we performed a chemoinformatic study of a set of standard anticancer agents assayed for activity against a panel of 60 human tumor-derived cell lines from the Developmental Therapeutics Program (DTP) at the National Cancer Institute (NCI).
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/41497/1/11095_2004_Article_465512.pd

    Subcellular Drug Distribution Data Set

    NIH RO1GM078200
    http://deepblue.lib.umich.edu/bitstream/2027.42/84659/1/Bioaccumulation_and_Biodistribution_Knowledgebase.xl

    HeLa cells incubated with styryl compound H46

    This is part of a larger collection of archives of 1344 styryl compounds containing digital microscopic images of HeLa cells incubated with styryl compounds, along with other information about those compounds. ** The compounds were synthesized in a combinatorial synthesis involving 8 pyridinium/quinolinium groups (A-H) and 168 aldehyde groups (1-168). The compounds are identified by their A-H letter followed by their 1-168 number. ** Each of the 12 images in this archive is a 512x512 pixel image stored as a bzip2 compressed sequence of 2 byte unsigned short integers (little endian format), in standard raster order. ** Fluorescence in the Hoechst channel derives from Hoechst dye. Fluorescence in the other channels is presumed to derive from the styryl molecules, although cellular autofluorescence is also present. ** The images were acquired at 200X magnification with a 20X objective. The images were acquired with a 12 bit CCD camera, so the greyscale intensities range from 0 to 4095. ** The "withdye" images were acquired after around 1 hour of incubation with the styryl probe. Next, the probe was washed out using an onboard robotic pipetting system, and the "washout" images were acquired. ** The Hoechst dye images were acquired at 50 msec exposure time; the other channels were acquired at both 1 sec and 200 msec exposure times (Cy5 was only acquired at 1 sec). ** The images were obtained using a Cellomics KineticScan instrument. Cells were kept alive and healthy in an onboard environmental control chamber. ** Styryl compounds were present at 100 micromolar concentration.
    http://deepblue.lib.umich.edu/bitstream/2027.42/60000/1/H46.ta
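The image encoding described in these records (512x512 pixels, bzip2-compressed, 2-byte unsigned little-endian integers in raster order, 12-bit intensities) can be decoded with the Python standard library alone. The following is a minimal sketch, not the depositors' own tooling: the function name `decode_image` is hypothetical, and because the records do not state how files are named or arranged inside each archive, the function simply takes the compressed bytes of one image.

```python
import bz2
import struct

WIDTH = 512           # pixels per row, as stated in the record
HEIGHT = 512          # number of rows, as stated in the record
MAX_INTENSITY = 4095  # 12-bit camera: intensities range from 0 to 4095


def decode_image(compressed: bytes) -> list[list[int]]:
    """Decode one bzip2-compressed image into a list of pixel rows.

    Hypothetical helper: assumes the input is the full content of a
    single image file from one of the archives.
    """
    raw = bz2.decompress(compressed)
    expected = WIDTH * HEIGHT * 2  # 2 bytes per pixel
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    # "<" selects little-endian byte order, "H" an unsigned 16-bit integer.
    pixels = struct.unpack(f"<{WIDTH * HEIGHT}H", raw)
    # Standard raster order: rows stored top to bottom, left to right.
    return [list(pixels[r * WIDTH:(r + 1) * WIDTH]) for r in range(HEIGHT)]
```

A caller would read one image file in binary mode and pass its bytes to `decode_image`. Values above `MAX_INTENSITY` would suggest that the file does not follow the 12-bit layout described here.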

    HeLa cells incubated with styryl compound E164

    This is part of a larger collection of archives of 1344 styryl compounds containing digital microscopic images of HeLa cells incubated with styryl compounds, along with other information about those compounds. ** The compounds were synthesized in a combinatorial synthesis involving 8 pyridinium/quinolinium groups (A-H) and 168 aldehyde groups (1-168). The compounds are identified by their A-H letter followed by their 1-168 number. ** Each of the 12 images in this archive is a 512x512 pixel image stored as a bzip2 compressed sequence of 2 byte unsigned short integers (little endian format), in standard raster order. ** Fluorescence in the Hoechst channel derives from Hoechst dye. Fluorescence in the other channels is presumed to derive from the styryl molecules, although cellular autofluorescence is also present. ** The images were acquired at 200X magnification with a 20X objective. The images were acquired with a 12 bit CCD camera, so the greyscale intensities range from 0 to 4095. ** The "withdye" images were acquired after around 1 hour of incubation with the styryl probe. Next, the probe was washed out using an onboard robotic pipetting system, and the "washout" images were acquired. ** The Hoechst dye images were acquired at 50 msec exposure time; the other channels were acquired at both 1 sec and 200 msec exposure times (Cy5 was only acquired at 1 sec). ** The images were obtained using a Cellomics KineticScan instrument. Cells were kept alive and healthy in an onboard environmental control chamber. ** Styryl compounds were present at 100 micromolar concentration.
    http://deepblue.lib.umich.edu/bitstream/2027.42/59612/1/E164.ta

    HeLa cells incubated with styryl compound H8

    This is part of a larger collection of archives of 1344 styryl compounds containing digital microscopic images of HeLa cells incubated with styryl compounds, along with other information about those compounds. ** The compounds were synthesized in a combinatorial synthesis involving 8 pyridinium/quinolinium groups (A-H) and 168 aldehyde groups (1-168). The compounds are identified by their A-H letter followed by their 1-168 number. ** Each of the 12 images in this archive is a 512x512 pixel image stored as a bzip2 compressed sequence of 2 byte unsigned short integers (little endian format), in standard raster order. ** Fluorescence in the Hoechst channel derives from Hoechst dye. Fluorescence in the other channels is presumed to derive from the styryl molecules, although cellular autofluorescence is also present. ** The images were acquired at 200X magnification with a 20X objective. The images were acquired with a 12 bit CCD camera, so the greyscale intensities range from 0 to 4095. ** The "withdye" images were acquired after around 1 hour of incubation with the styryl probe. Next, the probe was washed out using an onboard robotic pipetting system, and the "washout" images were acquired. ** The Hoechst dye images were acquired at 50 msec exposure time; the other channels were acquired at both 1 sec and 200 msec exposure times (Cy5 was only acquired at 1 sec). ** The images were obtained using a Cellomics KineticScan instrument. Cells were kept alive and healthy in an onboard environmental control chamber. ** Styryl compounds were present at 100 micromolar concentration.
    http://deepblue.lib.umich.edu/bitstream/2027.42/59962/1/H8.ta

    HeLa cells incubated with styryl compound C143

    This is part of a larger collection of archives of 1344 styryl compounds containing digital microscopic images of HeLa cells incubated with styryl compounds, along with other information about those compounds. ** The compounds were synthesized in a combinatorial synthesis involving 8 pyridinium/quinolinium groups (A-H) and 168 aldehyde groups (1-168). The compounds are identified by their A-H letter followed by their 1-168 number. ** Each of the 12 images in this archive is a 512x512 pixel image stored as a bzip2 compressed sequence of 2 byte unsigned short integers (little endian format), in standard raster order. ** Fluorescence in the Hoechst channel derives from Hoechst dye. Fluorescence in the other channels is presumed to derive from the styryl molecules, although cellular autofluorescence is also present. ** The images were acquired at 200X magnification with a 20X objective. The images were acquired with a 12 bit CCD camera, so the greyscale intensities range from 0 to 4095. ** The "withdye" images were acquired after around 1 hour of incubation with the styryl probe. Next, the probe was washed out using an onboard robotic pipetting system, and the "washout" images were acquired. ** The Hoechst dye images were acquired at 50 msec exposure time; the other channels were acquired at both 1 sec and 200 msec exposure times (Cy5 was only acquired at 1 sec). ** The images were obtained using a Cellomics KineticScan instrument. Cells were kept alive and healthy in an onboard environmental control chamber. ** Styryl compounds were present at 100 micromolar concentration.
    http://deepblue.lib.umich.edu/bitstream/2027.42/59254/1/C143.ta

    HeLa cells incubated with styryl compound G108

    This is part of a larger collection of archives of 1344 styryl compounds containing digital microscopic images of HeLa cells incubated with styryl compounds, along with other information about those compounds. ** The compounds were synthesized in a combinatorial synthesis involving 8 pyridinium/quinolinium groups (A-H) and 168 aldehyde groups (1-168). The compounds are identified by their A-H letter followed by their 1-168 number. ** Each of the 12 images in this archive is a 512x512 pixel image stored as a bzip2 compressed sequence of 2 byte unsigned short integers (little endian format), in standard raster order. ** Fluorescence in the Hoechst channel derives from Hoechst dye. Fluorescence in the other channels is presumed to derive from the styryl molecules, although cellular autofluorescence is also present. ** The images were acquired at 200X magnification with a 20X objective. The images were acquired with a 12 bit CCD camera, so the greyscale intensities range from 0 to 4095. ** The "withdye" images were acquired after around 1 hour of incubation with the styryl probe. Next, the probe was washed out using an onboard robotic pipetting system, and the "washout" images were acquired. ** The Hoechst dye images were acquired at 50 msec exposure time; the other channels were acquired at both 1 sec and 200 msec exposure times (Cy5 was only acquired at 1 sec). ** The images were obtained using a Cellomics KineticScan instrument. Cells were kept alive and healthy in an onboard environmental control chamber. ** Styryl compounds were present at 100 micromolar concentration.
    http://deepblue.lib.umich.edu/bitstream/2027.42/59894/1/G108.ta

    HeLa cells incubated with styryl compound B93

    This is part of a larger collection of archives of 1344 styryl compounds containing digital microscopic images of HeLa cells incubated with styryl compounds, along with other information about those compounds. ** The compounds were synthesized in a combinatorial synthesis involving 8 pyridinium/quinolinium groups (A-H) and 168 aldehyde groups (1-168). The compounds are identified by their A-H letter followed by their 1-168 number. ** Each of the 12 images in this archive is a 512x512 pixel image stored as a bzip2 compressed sequence of 2 byte unsigned short integers (little endian format), in standard raster order. ** Fluorescence in the Hoechst channel derives from Hoechst dye. Fluorescence in the other channels is presumed to derive from the styryl molecules, although cellular autofluorescence is also present. ** The images were acquired at 200X magnification with a 20X objective. The images were acquired with a 12 bit CCD camera, so the greyscale intensities range from 0 to 4095. ** The "withdye" images were acquired after around 1 hour of incubation with the styryl probe. Next, the probe was washed out using an onboard robotic pipetting system, and the "washout" images were acquired. ** The Hoechst dye images were acquired at 50 msec exposure time; the other channels were acquired at both 1 sec and 200 msec exposure times (Cy5 was only acquired at 1 sec). ** The images were obtained using a Cellomics KineticScan instrument. Cells were kept alive and healthy in an onboard environmental control chamber. ** Styryl compounds were present at 100 micromolar concentration.
    http://deepblue.lib.umich.edu/bitstream/2027.42/59036/1/B93.ta